ABSTRACT



A complex system is considered in its latter stages of development. 
N mission trials have been observed, each resulting in a success or a 
failure. Each failure occurs in one of k failure modes. For each 
failure mode that is observed action is taken to attempt to correct 
that type of failure. The probabilities of correcting the various 
failure modes are known. After corrective action is completed attempts 
to estimate the current reliability, without further sampling, are 
made. A brief historical summary of this problem to date is given. 

Justification for assuming a prior distribution on the failure 
modes is discussed and the posterior distribution of the parameters 
is developed. An intuitive measure of the current reliability is 
stated and certain properties of this random variable are developed. 
1. INTRODUCTION



In the latter stages of development and testing of a complex 
system it is reasonable to assume that a certain number of mission 
trials have been effected, and of these some number resulted in the 
failure of the mission. An intuitive measure of the reliability of
the system at this point would be the ratio of successes to the total
number of tests conducted. After analysis of the failure data, however,
it is conceivable that the reasons for each failure of the mission 
could be detected and some corrective action taken against this type 
of failure. After taking corrective action the natural step would be 
to continue testing and make statements about the reliability of the 
revised system on the basis of new data. 

If the cost of additional testing is high, if additional test units 
are not available, or if time is a prohibitive factor, further testing 
may not be feasible. At this point Corcoran, Weingarten and Zehna [1]
pose the following problem: "Assuming that we have confidence in our
knowledge of the effectiveness of the contemplated corrective action
which may be taken on observed failure modes, how should we use the
results of the first N tests to draw inferences about the current
reliability?" The problem is specifically structured in this manner. Let
N tests be conducted. Each of the tests results either in a successful
performance of the mission or in a failure. Each failure results from
a failure of one of k possible modes. The probability of a successful
mission is $q_0$; the probability of a mission failing by the i-th failure
mode is $q_i$, where i ranges from one to k. The $q_i$'s themselves are
unknown; however, since each trial must result in a success or some
type of failure it may be noted that

\[ \sum_{i=0}^{k} q_i = 1. \]

Similarly, if we let $N_i$ be the number of events of the i-th type observed,
where $N_0$ is the number of successful missions, then

\[ \sum_{i=0}^{k} N_i = N. \]

It is also assumed that there is a known conditional probability $a_i$ of
correcting the i-th failure mode given that it occurs.

Based on the above formulation Corcoran, Weingarten, and Zehna [1]
define the following random variable as a measure of current reliability:

\[ p^* = q_0 + \sum_{i=1}^{k} y_i q_i, \qquad \text{where } y_i = \begin{cases} 0 & \text{if } N_i = 0 \\ a_i & \text{if } N_i > 0. \end{cases} \]



This is stated to be an intuitive measure of the reliability since it
adds a weighted amount of the failure probability of each observed
failure mode to the initial reliability. The expected value of p* may
be computed and is referred to as the "mean reliability." This quantity
is shown to be:

\[ E[p^*] = q_0 + \sum_{i=1}^{k} a_i q_i \left[ 1 - (1 - q_i)^N \right] \]
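As a rough illustration of the mean reliability formula, the following sketch evaluates E[p*] for given values of the parameters. The function name `mean_reliability` and the numbers in the example call are assumptions introduced here for illustration; they do not come from the cited paper.

```python
def mean_reliability(q0, q, a, n_tests):
    """Evaluate E[p*] = q0 + sum_i a_i * q_i * (1 - (1 - q_i)**N).

    q0      -- probability of a successful mission
    q       -- failure-mode probabilities q_1..q_k
    a       -- correction probabilities a_1..a_k
    n_tests -- number of mission trials N
    """
    if abs(q0 + sum(q) - 1.0) > 1e-9:
        raise ValueError("q0 and the q_i must sum to one")
    # 1 - (1 - q_i)**N is the probability that mode i is observed at least once.
    return q0 + sum(ai * qi * (1.0 - (1.0 - qi) ** n_tests)
                    for ai, qi in zip(a, q))


# Illustrative values only (hypothetical, not taken from the source).
example = mean_reliability(0.7, [0.2, 0.1], [0.5, 0.8], 25)
```

With N = 0 no failure mode can be observed, so the function returns $q_0$; as N grows the value approaches $q_0 + \sum a_i q_i$.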

Since p* is a random variable it is customarily not estimated. However,
the variance of p* tends to zero as N tends to infinity; therefore any
estimate of E[p*] can be said to asymptotically estimate p*. The above
authors postulate seven estimators of E[p*] and discuss their relative
merits from the standpoint of bias. The last two estimators, quoted
here for convenience, are conservative (they underestimate E[p*]) and are
asymptotically unbiased. The last estimator, $\hat{p}_7$, is the more
conservative of the two but is also consistent for E[p*].



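\[ \hat{p}_6 = \frac{N_0}{N} + \sum_{i=1}^{k} z_i \frac{N_i}{N}, \qquad \text{where } z_i = \begin{cases} a_i & \text{if } N_i > 1 \\ 0 & \text{otherwise} \end{cases} \]

\[ \hat{p}_7 = \frac{N_0}{N} + \sum_{i=1}^{k} y_i \frac{N_i - 1}{N}, \qquad \text{where } y_i = \begin{cases} a_i & \text{if } N_i > 0 \\ 0 & \text{if } N_i = 0 \end{cases} \]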

The bias for estimator $\hat{p}_6$ is shown not to exceed .01 for an N of 25
and a $q_0$ as large as one third.
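The two conservative estimators can be sketched in a few lines. The forms used below are a reading of the garbled source, namely $\hat{p}_6 = N_0/N + \sum z_i N_i/N$ with $z_i = a_i$ when $N_i > 1$, and $\hat{p}_7 = N_0/N + \sum y_i (N_i - 1)/N$ with $y_i = a_i$ when $N_i > 0$; that reading, the function names, and the sample counts are all assumptions.

```python
def estimator_6(counts, a):
    """counts = [N_0, N_1, ..., N_k]; a = [a_1, ..., a_k].

    Credits a_i * N_i only for modes observed more than once.
    """
    n = sum(counts)
    credit = sum(ai * ni for ai, ni in zip(a, counts[1:]) if ni > 1)
    return (counts[0] + credit) / n


def estimator_7(counts, a):
    """Credits a_i * (N_i - 1) for every observed mode."""
    n = sum(counts)
    credit = sum(ai * (ni - 1) for ai, ni in zip(a, counts[1:]) if ni > 0)
    return (counts[0] + credit) / n


# Hypothetical outcome of N = 25 trials with k = 4 failure modes.
counts = [20, 3, 1, 1, 0]
a = [0.5, 0.8, 0.9, 0.7]
p6 = estimator_6(counts, a)
p7 = estimator_7(counts, a)
```

Under this reading the second estimator never exceeds the first, which agrees with the statement that it is the more conservative of the two.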

Larson [3] considers the conditional distribution of the reliability
(conditioned on the outcome of the test) and demonstrates its probability
mass function. Unfortunately, while the mass function is known, the
actual values the random variable takes on are unknown since they are
functions of the $q_i$'s. A functional lower bound on the true reliability
is shown to be:

\[ P[Z \ge f \mid A] \ge P_m[g \ge f \mid A] \; P[Z \ge g \mid A] \]

where the conditioning event A specifies which of the $N_i$ are greater
than 0 and which are equal to 0, f is
a function of the observable random variables and g is some function of
the unknown parameters. $P_m[g \ge f \mid A]$ is a conditional multinomial
probability statement and $P[Z \ge g \mid A]$ is a statement from the distribution
of true reliability conditioned on sample results. The expression
$P[Z \ge g \mid A]$ can be evaluated for a certain class of g functions*; thus a
lower bound on the reliability can be obtained if the appropriate
conditional multinomial statement can be derived.

* Larson, H. J., Conditional Distribution of True Reliability after
Corrective Action (US Naval Postgraduate School, Technical Report/Research
Paper No. 61, 1966) p. 10.

This approach has been considered by the present investigator but
abandoned due to the compounding complexity in the probability
statements. The class of functions of the parameters ($q_i$'s) is restrictive
since $P[Z \ge g \mid A]$ must be capable of evaluation. No suitable
functions of the observable random variables have been found that lead
to direct evaluation of the multinomial statement. The conditional
distribution of the sample can be obtained but it becomes exceedingly
complex as the number of failure modes increases. As an example, if we
consider only three modes of failure (with counts $N_1$, $N_2$, $N_3$) and the
conditioning event $A = \{N_1 > 0, N_2 > 0, N_3 = 0\}$, then the conditional
probability is as follows:

\[ P[N_0 = a, N_1 = b, N_2 = N-(a+b) \mid A] = \frac{N! \; q_0^{a} \, q_1^{b} \, q_2^{N-(a+b)}}{a! \, b! \, [N-(a+b)]! \, \left[ (q_0+q_1+q_2)^N + q_0^N - (q_0+q_1)^N - (q_0+q_2)^N \right]} \]

for $0 \le a \le N-2$ and $0 < b < N-a$, and

\[ P[N_0 = a, N_1 = b, N_2 = N-(a+b) \mid A] = 0 \quad \text{otherwise.} \]
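The three-mode example can be checked numerically. The sketch below evaluates the conditional probability for the event that modes 1 and 2 are observed and mode 3 is not, and sums it over its support; the function name and the parameter values are hypothetical.

```python
from math import factorial


def conditional_pmf(a, b, n, q0, q1, q2):
    """P[N0 = a, N1 = b, N2 = n - a - b | N1 > 0, N2 > 0, N3 = 0]."""
    # Support: at least one failure in each of modes 1 and 2.
    if not (0 <= a <= n - 2 and 0 < b < n - a):
        return 0.0
    c = n - a - b
    coef = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    numer = coef * q0 ** a * q1 ** b * q2 ** c
    # Probability of the conditioning event, by inclusion and exclusion.
    denom = ((q0 + q1 + q2) ** n + q0 ** n
             - (q0 + q1) ** n - (q0 + q2) ** n)
    return numer / denom


# Hypothetical parameters: q3 = 1 - (q0 + q1 + q2) = 0.1, N = 6 trials.
n = 6
total = sum(conditional_pmf(a, b, n, 0.6, 0.2, 0.1)
            for a in range(n + 1) for b in range(n + 1))
```

The conditional probabilities sum to one over the support, which is a useful sanity check on the denominator. Even in this small case the denominator already has four terms; the number of terms doubles with each additional observed failure mode, which illustrates the complexity noted above.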

It would seem that any direct attempt to derive the distribution of
some function of the $N_i$'s is hopeless. Even if the distribution of
the multinomial statement can be derived for some class of f and g
functions there is no guarantee that these choices of functions will
produce a useful bound on the reliability. The right hand side of the
confidence interval statement is multiplicative and hence both
$P_m[g \ge f \mid A]$ and $P[Z \ge g \mid A]$ must be capable of simultaneously yielding
high numerical values at the sample point. This of course places
further restrictions on the class of functions available.
2. BAYESIAN VIEWPOINT



In both previously cited works [1, 3] the probabilities of failure
for each of the various modes are considered unknown parameters. In 
certain cases it may be feasible to assume a prior distribution on the 
failure modes themselves. As a specific example if the system consists 
of k components in logical series one failure mode may be associated 
with each component. Since earlier developmental history should be 
available on each component it is reasonable to assume that something 
is known about the reliability of each. On the other hand if each mode 
of failure is to be some type of failure (electrical, mechanical, etc.), 
it still may be reasonable to assume that there exists some prior 
knowledge of these failure rates. 

In considering priors in general there are several requirements 
that must be met. The marginal range of each of the random variables 
must be the interval zero to one. Each of the component reliabilities 
will be close to one; hence the distribution chosen for a failure mode 
must be capable of lumping a good percentage of its probability near 
zero. The prior should lead to easy computation if possible or at 
least be tractable. 

Three forms of prior distributions were considered.

(1) [The nonzero branch of this density is illegible in the source; the density is 0 otherwise.]

(2) \[ f(q_1, \ldots, q_k) = G_2 \, q_1^{m_1} q_2^{m_2} \cdots q_k^{m_k}, \qquad 0 \le q_i \le b_i \le 1/k, \quad i = 1, \ldots, k, \]
and 0 otherwise.

(3) \[ f_{Q_0, \ldots, Q_k}(q_0, q_1, \ldots, q_k) = G_3 \, q_0^{m_0 - 1} q_1^{m_1 - 1} \cdots q_k^{m_k - 1}, \qquad \text{where } \sum_{i=0}^{k} q_i = 1. \]

In all three cases $q_0$ represents the probability of a successful mission
and $q_i$ represents the probability of the i-th failure mode as before.
The $G_i$'s are appropriate constants.

The first two priors assume independence between the various
failure modes. Unfortunately the posterior for each of these
distributions is difficult to demonstrate. The k-fold integral appearing
in the denominator of each posterior cannot be evaluated in closed form,
and numerical techniques would have to be resorted to if it were to be
evaluated. This was not deemed worthwhile since the third form of the prior is
actually a more general representation than the first two. It is true
that (3) disallows independence between failure modes but no strong
argument can be made for this independence. The last prior will be
discussed more thoroughly and developed in the next section.






3. PRIOR DISTRIBUTION OF COMPONENT FAILURES 



If we consider a series of N tests as before with $q_0$ as the
probability of a success on any given trial and $q_i$, i = 1, 2, ..., k, as the
probability of a failure by the i-th failure mode, then the probability
of observing $N_0$ successes and $N_i$ failures, i = 1, 2, ..., k, in the respective
failure modes will follow the multinomial probability law.
Thus,

\[ P[N_0, N_1, \ldots, N_k \mid q_0, q_1, \ldots, q_k] = \frac{N!}{\prod_{i=0}^{k} N_i!} \prod_{i=0}^{k} q_i^{N_i}, \qquad N_i = 0, 1, 2, \ldots, N, \quad \sum_{i=0}^{k} N_i = N. \]

Raiffa and Schlaifer* suggest using a prior of the same functional
form as the distribution of the sample. This procedure leads to prior
(3) depicted in the last section. The advantages of this choice for a
prior will become apparent in the succeeding development.
Thus,

\[ p[q_0, q_1, \ldots, q_k] = G \, q_0^{m_0 - 1} q_1^{m_1 - 1} \cdots q_k^{m_k - 1}, \qquad \text{where } \sum_{i=0}^{k} q_i = 1, \quad q_i \ge 0, \]

shall be the chosen prior. Making use of the usual fact that a density
integrated over its range must equal one allows us to evaluate G.

* Raiffa, H. and Schlaifer, R., Applied Statistical Decision Theory
(Boston: Graduate School of Business Administration, Harvard University,
1961) pp. 47-49.






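Hence

\[ \frac{1}{G} = \int \cdots \int \Big( 1 - \sum_{i=1}^{k} q_i \Big)^{m_0 - 1} q_1^{m_1 - 1} \cdots q_k^{m_k - 1} \, dq_1 \cdots dq_k \]

\[ = \int \cdots \int q_1^{m_1 - 1} \cdots q_{k-1}^{m_{k-1} - 1} \left[ \int_{0}^{1 - \sum_{i=1}^{k-1} q_i} \Big( 1 - \sum_{i=1}^{k} q_i \Big)^{m_0 - 1} q_k^{m_k - 1} \, dq_k \right] dq_1 \cdots dq_{k-1}. \]

Now letting $x = q_k / \big( 1 - \sum_{i=1}^{k-1} q_i \big)$,

\[ \frac{1}{G} = \int \cdots \int \Big( 1 - \sum_{i=1}^{k-1} q_i \Big)^{m_k + m_0 - 1} q_1^{m_1 - 1} \cdots q_{k-1}^{m_{k-1} - 1} \, dq_1 \cdots dq_{k-1} \; \int_{0}^{1} x^{m_k - 1} (1 - x)^{m_0 - 1} \, dx. \]

It may be noted that the last integral in the above expression is just
a Beta integral with parameters $m_k$ and $m_0$ integrated over its range.
Hence:

\[ \int_{0}^{1} x^{m_k - 1} (1 - x)^{m_0 - 1} \, dx = B(m_k, m_0). \]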
By a continued iteration of the above procedure it may be shown that

\[ \frac{1}{G} = B(m_k, m_0) \, B(m_{k-1}, m_0 + m_k) \, B(m_{k-2}, m_0 + m_k + m_{k-1}) \cdots B(m_1, m_0 + m_k + m_{k-1} + \cdots + m_2). \]

Noting that

\[ B(m_k, m_0) = \frac{\Gamma(m_k) \, \Gamma(m_0)}{\Gamma(m_k + m_0)}, \]

\[ G = \frac{\Gamma(m_0 + m_1 + \cdots + m_k)}{\Gamma(m_0) \, \Gamma(m_1) \cdots \Gamma(m_k)} = \frac{\Gamma\big( \sum_{i=0}^{k} m_i \big)}{\prod_{i=0}^{k} \Gamma(m_i)}. \]

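Hence the complete prior is given by:

\[ f_{Q_0, \ldots, Q_k}(q_0, q_1, \ldots, q_k) = \frac{\Gamma\big( \sum_{i=0}^{k} m_i \big)}{\prod_{i=0}^{k} \Gamma(m_i)} \; q_0^{m_0 - 1} q_1^{m_1 - 1} \cdots q_k^{m_k - 1}, \]

\[ \text{where } \sum_{i=0}^{k} q_i = 1 \text{ and } q_i \ge 0, \quad i = 0, 1, \ldots, k. \]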
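The normalizing constant is easiest to evaluate on the log scale, since the Gamma functions overflow quickly for large parameters. A minimal sketch, assuming the parameters are supplied as a list m = [m_0, ..., m_k]; the function names are introduced here for illustration only.

```python
from math import exp, lgamma, log


def log_norm_const(m):
    """log G = log Gamma(sum of m_i) - sum of log Gamma(m_i)."""
    return lgamma(sum(m)) - sum(lgamma(mi) for mi in m)


def prior_density(q, m):
    """Prior density at q = [q_0, ..., q_k].

    Points off the simplex, and points with some q_i <= 0, are given
    density zero in this sketch (interior points only).
    """
    if abs(sum(q) - 1.0) > 1e-9 or min(q) <= 0.0:
        return 0.0
    log_kernel = sum((mi - 1.0) * log(qi) for mi, qi in zip(m, q))
    return exp(log_norm_const(m) + log_kernel)


# With k = 1 the prior reduces to an ordinary Beta density:
# G = Gamma(5) / (Gamma(2) * Gamma(3)) = 12 for m = [2, 3].
g_two_params = exp(log_norm_const([2.0, 3.0]))
```

The k = 1 case gives a quick check against the familiar Beta normalizing constant 1/B(2, 3) = 12.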



Several properties of this prior will be needed at a later point; for 
convenience they will be derived or substantiated at this time. 



(a) Marginal Distribution of the $q_i$'s
Consider

\[ f_{Q_1}(q_1) = \frac{\Gamma\big( \sum_{i=0}^{k} m_i \big)}{\prod_{i=0}^{k} \Gamma(m_i)} \; q_1^{m_1 - 1} \int \cdots \int q_0^{m_0 - 1} q_2^{m_2 - 1} \cdots q_k^{m_k - 1} \, dq_2 \cdots dq_k, \qquad q_0 = 1 - \sum_{i=1}^{k} q_i. \]

Iterating this integral as in the previous case,

\[ f_{Q_1}(q_1) = \frac{\Gamma\big( \sum_{i=0}^{k} m_i \big)}{\prod_{i=0}^{k} \Gamma(m_i)} \; B(m_k, m_0) \cdots B(m_2, m_0 + m_k + \cdots + m_3) \; q_1^{m_1 - 1} (1 - q_1)^{m_0 + m_k + \cdots + m_3 + m_2 - 1}. \]

Letting $m = \sum_{i=0}^{k} m_i$,

\[ f_{Q_1}(q_1) = \frac{\Gamma(m)}{\Gamma(m_1) \, \Gamma(m - m_1)} \; q_1^{m_1 - 1} (1 - q_1)^{m - m_1 - 1} = \frac{1}{B(m_1, m - m_1)} \; q_1^{m_1 - 1} (1 - q_1)^{m - m_1 - 1}, \qquad 0 \le q_1 \le 1, \]

and $f_{Q_1}(q_1) = 0$ otherwise.

By symmetry arguments the i-th marginal is of the same form; therefore

\[ f_{Q_i}(q) = \frac{1}{B(m_i, m - m_i)} \; q^{m_i - 1} (1 - q)^{m - m_i - 1} = f_{\beta}(q; m_i, m - m_i), \qquad 0 \le q \le 1. \]

Thus the marginals of the prior are Beta distributed with the appropriate 
parameters as shown. For this reason and for convenience in notation 
the prior itself will be referred to as a multivariate Beta. 



(b) Expected Value of the $Q_i$'s with Respect to the Prior Marginals

\[ E(Q_i) = \int_{0}^{1} q \, f_{Q_i}(q) \, dq = \frac{1}{B(m_i, m - m_i)} \int_{0}^{1} q^{m_i} (1 - q)^{m - m_i - 1} \, dq. \]

Hence

\[ E(Q_i) = \frac{B(m_i + 1, m - m_i)}{B(m_i, m - m_i)} = \frac{m_i}{m}. \]

(c) Variance of the $Q_i$'s

\[ E(Q_i^2) = \frac{1}{B(m_i, m - m_i)} \int_{0}^{1} q^{m_i + 1} (1 - q)^{m - m_i - 1} \, dq = \frac{m_i (m_i + 1)}{m (1 + m)}. \]

Thus

\[ \mathrm{Var}(Q_i) = E(Q_i^2) - E^2(Q_i) = \frac{m_i (m - m_i)}{m^2 (1 + m)}. \]

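(d) Joint Density of $Q_i$, $Q_j$

\[ f_{Q_i, Q_j}(q_i, q_j) = \frac{\Gamma(m)}{\Gamma(m_i) \, \Gamma(m_j) \, \Gamma(m - m_i - m_j)} \; q_i^{m_i - 1} q_j^{m_j - 1} (1 - q_i - q_j)^{m - m_i - m_j - 1}, \]

\[ q_i + q_j \le 1, \quad q_i \ge 0, \quad q_j \ge 0. \]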



(e) Covariance of $Q_i$, $Q_j$

\[ E(Q_i Q_j) = \frac{\Gamma(m)}{\Gamma(m_i) \, \Gamma(m_j) \, \Gamma(m - m_i - m_j)} \int_{q_i = 0}^{1} \int_{q_j = 0}^{1 - q_i} q_i^{m_i} q_j^{m_j} (1 - q_i - q_j)^{m - m_i - m_j - 1} \, dq_j \, dq_i. \]

Let

\[ x = \frac{q_j}{1 - q_i}. \]

Then integrating as before,

\[ E(Q_i Q_j) = \frac{m_i m_j}{m (1 + m)}. \]

And

\[ \mathrm{Cov}(Q_i, Q_j) = E(Q_i Q_j) - E(Q_i) E(Q_j) = \frac{m_i m_j}{m (1 + m)} - \frac{m_i m_j}{m^2} = -\frac{m_i m_j}{m^2 (1 + m)}. \]
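The moment formulas (b), (c) and (e) can be collected into one small routine. The sketch below assumes the parameters are supplied as m = [m_0, ..., m_k] and returns the means and the full covariance matrix for $Q_0, \ldots, Q_k$; the function name and the example parameters are hypothetical.

```python
def prior_moments(m):
    """Return (means, cov) of Q_0..Q_k under the multivariate Beta prior.

    means[i]  = m_i / m
    cov[i][i] = m_i (m - m_i) / (m^2 (1 + m))
    cov[i][j] = -m_i m_j / (m^2 (1 + m)),  i != j
    where m is the sum of the m_i.
    """
    total = sum(m)
    means = [mi / total for mi in m]
    scale = total ** 2 * (1.0 + total)
    cov = [[(mi * (total - mi) if i == j else -mi * mj) / scale
            for j, mj in enumerate(m)]
           for i, mi in enumerate(m)]
    return means, cov


# Hypothetical prior: mean reliability 0.8 and two failure modes at 0.1 each.
means, cov = prior_moments([8.0, 1.0, 1.0])
```

Because the $Q_i$ sum to one, every row of the covariance matrix sums to zero, which makes a convenient check on the signs of the off-diagonal terms.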
4. POSTERIOR DISTRIBUTION OF THE PARAMETERS



In the process described previously the distribution of the sample
was given by

\[ P[N_0, \ldots, N_k \mid q_0, \ldots, q_k] = \frac{N!}{\prod_{i=0}^{k} N_i!} \; q_0^{N_0} q_1^{N_1} \cdots q_k^{N_k}. \]

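Utilizing the prior described in section three and applying Bayes' rule
we get

\[ P[q_0, \ldots, q_k \mid N_0, \ldots, N_k] = \frac{q_0^{N_0 + m_0 - 1} q_1^{N_1 + m_1 - 1} \cdots q_k^{N_k + m_k - 1}}{\int \cdots \int q_1^{N_1 + m_1 - 1} \cdots q_k^{N_k + m_k - 1} \Big[ 1 - \sum_{i=1}^{k} q_i \Big]^{N_0 + m_0 - 1} \, dq_1 \cdots dq_k}. \]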



Noting the similarity between this integral and that previously seen
in section three gives

\[ P[q_0, \ldots, q_k \mid N_0, \ldots, N_k] = \frac{\Gamma(N + m)}{\Gamma(N_0 + m_0) \cdots \Gamma(N_k + m_k)} \; q_0^{N_0 + m_0 - 1} \cdots q_k^{N_k + m_k - 1}, \]

where $N = \sum_{i=0}^{k} N_i$ and $m = \sum_{i=0}^{k} m_i$ as before.
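In computational terms the update from prior to posterior amounts to adding the observed counts to the prior parameters. A minimal sketch, with hypothetical function names and example values:

```python
def posterior_parameters(m, counts):
    """Combine prior parameters m = [m_0..m_k] with counts [N_0..N_k].

    The posterior is a multivariate Beta with parameters N_i + m_i.
    """
    if len(m) != len(counts):
        raise ValueError("m and counts must have the same length")
    return [mi + ni for mi, ni in zip(m, counts)]


def posterior_means(m, counts):
    """Posterior means (N_i + m_i) / (N + m)."""
    post = posterior_parameters(m, counts)
    total = sum(post)
    return [p / total for p in post]


# Hypothetical prior and test outcome (N = 25 trials, k = 2 failure modes).
post = posterior_parameters([8.0, 1.0, 1.0], [20, 4, 1])
post_means = posterior_means([8.0, 1.0, 1.0], [20, 4, 1])
```

The prior parameters act like counts from m earlier pseudo-trials, so a prior with a large m is moved less by the N observed trials than one with a small m.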



Except for suitable changes in parameters this distribution can be 
recognized as the multivariate Beta of the previous section. This 
facility in handling the posterior is, of course, one of the reasons 
for choosing this particular distribution for a prior. Of course a 
prior must meet more requirements than just facility in use. The 
multivariate Beta in question does satisfy the intuitive demands set
forth earlier. A logical method for selecting the parameters of the
prior ($m_i$'s) will be given in Appendix I. The assumption of this prior
is equivalent to a form of "linear squashing" described by Good* in his
work on the estimation of probabilities.

As has been noted above the posterior distribution is itself a 
Beta, hence the properties of the posterior may be stated by analogy 
from the previous results. For future convenience these properties 
will be listed below. 



(a) Posterior Marginal

\[ f_{Q_i}(q \mid N_0, \ldots, N_k) = \frac{1}{B(N_i + m_i, \; N + m - (N_i + m_i))} \; q^{N_i + m_i - 1} (1 - q)^{N + m - (N_i + m_i) - 1} \]

(b) Marginal Mean

\[ E(Q_i \mid N_0, \ldots, N_k) = \frac{N_i + m_i}{N + m} \]

(c) Marginal Variance

\[ \mathrm{Var}(Q_i \mid N_0, \ldots, N_k) = \frac{(N + m)(N_i + m_i) - (N_i + m_i)^2}{(N + m)^2 (1 + N + m)} \]

(d) Conditional Covariance of $Q_i$, $Q_j$

\[ \mathrm{Cov}(Q_i, Q_j \mid N_0, \ldots, N_k) = -\frac{(N_i + m_i)(N_j + m_j)}{(N + m)^2 (1 + N + m)} \]

* Good, I. J., The Estimation of Probabilities, an Essay on Modern
Bayesian Methods (Cambridge: Research Monograph No. 30, The M.I.T. Press,
1965) p. 24.
5. RELIABILITY



Corcoran, Weingarten and Zehna [1] define the current measure of
reliability, p*, that was stated in section one of this paper. The
estimation problem is considered to have taken place prior to the
observation of the N tests. Averaging over all possible outcomes of
the N tests to produce the "mean reliability" is justified on this basis.
Larson [3] regards this as being of interest in the early phases of development
before testing can take place but points out that the final reliability
is actually a function of the outcomes of the N tests. This is true
because whether or not a corrective attempt is made depends on observing
the given failure mode. For this reason he develops the conditional
distribution of true reliability (conditioned on the outcomes of the
N tests) and shows that p* is in fact the mean of this conditional
distribution. This poses the question of how to properly interpret E(p*).
Since p* is the mean of the conditional reliability it would seem
unreasonable to average over all possible outcomes of the N tests (some
of which are known not to have occurred) to obtain E(p*). Larson
considers it more reasonable to attempt to estimate p* rather than E(p*).

There would seem to be other avenues of approach open. Consider
the original question of attempting to make probability statements about
a complex system after some period of testing and applying corrective
action. The discreteness of the conditional reliability function could
be interpreted as an anomaly arising from simplification in the logical
statement of the problem. The uncertainty in the actual reliability
arises from two sources: the uncertainty in the failure modes themselves
and the uncertainty of correcting a type of failure if it occurs. In
some cases it may be feasible to treat the uncertainty in the failure
modes by assuming a prior distribution on the $q_i$'s. Partial
justification for this procedure was given in section two.

It does appear attractive to treat the reliability as conditioned
on the outcomes of the N tests. At least at that point in time we know
which modes of failure it is possible to remove. With the above
considerations in mind the following scheme is proposed. Let N tests
be conducted, the results of which are $N_0$ successes and $N_i$ failures in
the respective failure modes, i = 1, 2, ..., k. The probability of a
success is $q_0$, the probability of the failure of the i-th mode is $q_i$,
where

\[ \sum_{i=0}^{k} q_i = 1, \qquad \sum_{i=0}^{k} N_i = N. \]



The maximum number of failure modes that can occur is k; it is not
necessary that each mode be observed, however, and a sample will in
general demonstrate fewer than k failure types. If we let $a_i$ be the
probability of correcting the i-th failure mode given that it is observed
and $b_i$ be the unconditional probability of correcting each failure mode,
it follows that

\[ b_i = \begin{cases} 0 & N_i = 0 \\ a_i & N_i > 0. \end{cases} \]

The following is stated as an intuitive measure of the reliability
after corrective action. This is of course the same measure defined
by Corcoran, Weingarten and Zehna [1]. The interpretation given to
it here, however, differs from that of the above paper. The actual
sampling is considered to have taken place and R is conditioned on this
outcome through the $b_i$'s.

\[ R = q_0 + \sum_{i=1}^{k} b_i q_i = 1 - \sum_{i=1}^{k} q_i + \sum_{i=1}^{k} b_i q_i = 1 + \sum_{i=1}^{k} (b_i - 1) q_i. \]
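For fixed values of the $q_i$ the measure $R = 1 + \sum (b_i - 1) q_i$ is a direct computation once the observed failure counts determine the weights $b_i$. A small sketch with hypothetical names and numbers:

```python
def correction_weights(failure_counts, a):
    """b_i = a_i if mode i was observed (N_i > 0), otherwise 0."""
    return [ai if ni > 0 else 0.0 for ni, ai in zip(failure_counts, a)]


def reliability(q, b):
    """R = 1 + sum_i (b_i - 1) q_i, with q = [q_1..q_k] and b = [b_1..b_k]."""
    return 1.0 + sum((bi - 1.0) * qi for bi, qi in zip(b, q))


# Hypothetical case: mode 1 observed three times, mode 2 never observed.
b = correction_weights([3, 0], [0.5, 0.9])
r = reliability([0.2, 0.1], b)
```

In this example only mode 1 receives credit, so R equals $q_0 + a_1 q_1$; the unobserved mode contributes its full failure probability regardless of how easily it could have been corrected.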




If we are willing to assume a distribution on the $q_i$'s then R is
a function of random variables and is hence itself a random variable.
The remainder of this study will be devoted to the development of the
properties of this measure of reliability with respect to the posterior
distribution developed in section four. R is the weighted sum of
dependent Beta distributed random variables. It is not in general possible to
obtain in closed form the convolution even of independent Betas; hence
there is little or no hope of deriving the distribution of R.



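Now

\[ E(R) = 1 + \sum_{i=1}^{k} (b_i - 1) E(q_i). \]

Noting from section four that

\[ E(q_i) = \frac{N_i + m_i}{N + m}, \]

\[ E(R) = 1 + \sum_{i=1}^{k} (b_i - 1) \frac{N_i + m_i}{N + m}. \]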
Using the variance and covariance formulation developed in section four,
let

\[ \sigma_{ij} = -\frac{(N_i + m_i)(N_j + m_j)}{(N + m)^2 (N + m + 1)}, \qquad i \ne j, \]

\[ \sigma_{ii} = \frac{(N + m)(N_i + m_i) - (N_i + m_i)^2}{(N + m)^2 (N + m + 1)}. \]

Now let V equal the matrix

\[ V = (\sigma_{ij}), \qquad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, k, \]

and a' equal the vector

\[ a' = (b_1 - 1, \; b_2 - 1, \; \ldots, \; b_k - 1). \]

Then the variance of R is the quadratic form

\[ \mathrm{Var}(R) = a' V a. \]
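The posterior mean and variance of R can be computed together from the prior parameters, the observed counts and the correction probabilities. The sketch below assumes the posterior covariance terms of section four, with negative off-diagonal entries; the function name and the numerical inputs are hypothetical.

```python
def reliability_mean_variance(m, counts, a):
    """Posterior E(R) and Var(R).

    m      -- prior parameters [m_0..m_k]
    counts -- observed counts  [N_0..N_k]
    a      -- correction probabilities [a_1..a_k]
    """
    post = [mi + ni for mi, ni in zip(m, counts)]   # N_i + m_i
    total = sum(post)                               # N + m
    scale = total ** 2 * (total + 1.0)
    # Coefficients b_i - 1, with b_i = a_i only for observed modes.
    c = [(ai if ni > 0 else 0.0) - 1.0 for ni, ai in zip(counts[1:], a)]
    p = post[1:]
    mean = 1.0 + sum(ci * pi / total for ci, pi in zip(c, p))
    var = 0.0
    for i, (ci, pi) in enumerate(zip(c, p)):
        for j, (cj, pj) in enumerate(zip(c, p)):
            # Numerator of sigma_ij: variance on the diagonal, covariance off it.
            sigma = pi * (total - pi) if i == j else -pi * pj
            var += ci * cj * sigma / scale
    return mean, var


# Hypothetical prior, test outcome and correction probabilities.
mean_r, var_r = reliability_mean_variance([8.0, 1.0, 1.0], [20, 4, 1], [0.5, 0.8])
```

When every observed mode is corrected with certainty and every mode has been observed, all coefficients $b_i - 1$ vanish and the routine returns a mean of one with zero variance, as the formulas require.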

Thus although the distribution of R cannot be shown its mean and
variance are known. These parameters are functions of the $N_i$'s and
hence are uniquely determined for each sample point. In an informal
sense the mean and variance of a random variable can be said to "estimate"
the random variable and hence E(R) provides some indication of the
corrected value of the reliability. In the above case it may be noted
that the variance of R tends to zero as N tends to infinity; hence the
density of R "lumps" at the mean as N increases. In this sense E(R)
may be said to asymptotically estimate R as N increases. This fact may
be of little use, however, since the N's of interest can be expected to
be small.
It is feasible that the higher moments of the distribution could
be obtained. While these moments may be complex they should be functions
of the N's and hence capable of being evaluated. Obtaining all of the
moments would of course be equivalent to obtaining the actual
distribution.

One last avenue of approach may be worthy of mention. It was 
initially hoped that the formulation in this study would lead to the 
ability to make confidence interval statements about the reliability. 
Except for the loose bound obtainable through Chebychev's inequality 
this has not been realized. The question of the existence of some 
limit theorem has not been thoroughly investigated. This does not 
appear promising but it may warrant some further consideration. 
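The loose bound available through Chebychev's inequality can be sketched as follows. Since P(|R - E(R)| >= t) <= Var(R)/t^2, choosing t = sqrt(Var(R)/alpha) gives a value r with P(R > r) >= 1 - alpha. The function name and the example inputs are assumptions made for illustration.

```python
from math import sqrt


def chebyshev_lower_bound(mean, var, alpha):
    """Return r such that P(R > r) >= 1 - alpha by Chebychev's inequality."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # t is chosen so that var / t**2 equals alpha.
    t = sqrt(var / alpha)
    # Reliability cannot be negative, so the bound is floored at zero.
    return max(0.0, mean - t)


# Hypothetical posterior moments of R.
bound = chebyshev_lower_bound(0.9, 0.0004, 0.04)
```

The bound uses only the first two moments and ignores the shape of the distribution, which is why it tends to be wide for the small N of interest.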
APPENDIX I



SELECTION OF PRIOR PARAMETERS 

One possible method of selecting the parameters of a prior
distribution would be to fit a set of prior means and variances to
the distribution. With the multivariate Beta in question, however,
there are insufficient parameters (k + 1) to do this. The following
method suggested by Silver* is given.

We use the last k parameters to fit the means, then the remaining
parameter, $m_0$, is used to obtain a least squares fit to the prior
variances.

Let

\[ m = \sum_{i=0}^{k} m_i. \]

Then

\[ (1) \qquad E(q_i) = \frac{m_i}{m}, \qquad i = 1, 2, \ldots, k \]

\[ (2) \qquad \mathrm{Var}(q_i) = \frac{m_i (m - m_i)}{m^2 (1 + m)}, \qquad i = 1, 2, \ldots, k \]

Substituting equation (1) into equation (2),

\[ \mathrm{Var}(q_i) = \frac{E(q_i) \, [1 - E(q_i)]}{m + 1}. \]

Let $\widehat{\mathrm{Var}}(q_i)$ be the prior estimated variance of $q_i$.

* Silver, E. A., Markovian Decision Processes with Uncertain
Transition Probabilities or Rewards (M.I.T., Interim Technical
Report No. 1, 1963) Appendix C, pp. 177-179.
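Hence we want to choose m to minimize

\[ SS = \sum_{i=1}^{k} \left( \widehat{\mathrm{Var}}(q_i) - \mathrm{Var}(q_i) \right)^2 = \sum_{i=1}^{k} \left( \widehat{\mathrm{Var}}(q_i) - \frac{E(q_i) \, [1 - E(q_i)]}{m + 1} \right)^2. \]

Setting d(SS)/dm equal to zero we get the following for a least squares
value of m equal to $\hat{m}$:

\[ \hat{m} + 1 = \frac{\sum_{i=1}^{k} \left\{ E(q_i) \, [1 - E(q_i)] \right\}^2}{\sum_{i=1}^{k} \widehat{\mathrm{Var}}(q_i) \, E(q_i) \, [1 - E(q_i)]} \]

and hence

\[ \hat{m}_i = \hat{m} \, E(q_i), \qquad i = 1, 2, \ldots, k. \]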
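The parameter selection procedure can be sketched as follows. The closed form used for the least squares value is obtained by minimizing SS with Var(q_i) = E(q_i)[1 - E(q_i)]/(m + 1); the step that recovers m_0 as the remainder after the k fitted parameters follows from the definition of m as the sum of all the m_i. The function name and the numerical inputs are hypothetical.

```python
def fit_prior_parameters(means, variances):
    """Fit [m_0, m_1, ..., m_k] from prior means and variances of q_1..q_k.

    The means are matched exactly through m_i = m * E(q_i); the single
    remaining degree of freedom, m, is set by least squares on the variances.
    """
    c = [e * (1.0 - e) for e in means]            # E(q_i) [1 - E(q_i)]
    numer = sum(ci * ci for ci in c)
    denom = sum(ci * vi for ci, vi in zip(c, variances))
    m_hat = numer / denom - 1.0
    if m_hat <= 0.0:
        raise ValueError("stated variances are too large for this prior family")
    m_fail = [m_hat * e for e in means]           # m_1..m_k
    m_zero = m_hat - sum(m_fail)                  # m_0 is what remains of m
    return [m_zero] + m_fail


# Hypothetical inputs: two failure modes, each with mean 0.1 and variance 9/1100.
params = fit_prior_parameters([0.1, 0.1], [9.0 / 1100.0, 9.0 / 1100.0])
```

In this example the stated means and variances are exactly those of a prior with parameters (8, 1, 1), so the fit recovers them; with inconsistent inputs the fitted variances will agree with the stated ones only in the least squares sense.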